Start with pprof heap and CPU profiles to find top allocators, then reduce allocations using sync.Pool, pre-allocated slices, and avoiding interface boxing in hot paths.
sync.Pool: reuse short-lived objects (byte buffers, encoder/decoder instances) across requests
Pre-allocate slices: make([]T, 0, knownSize) avoids repeated doublings
Avoid fmt.Sprintf in hot paths: use strings.Builder or strconv for string construction
Avoid interface{} boxing: passing concrete types to interface{} parameters allocates on heap
Flat data structures: avoid deep pointer graphs — GC must trace every pointer
Use GODEBUG=gctrace=1 to monitor GC frequency and pause times in production